Linear Thompson Sampling Revisited (Appendix A: Examples of TS distributions)

Authors

  • Marc Abeille
  • Alessandro Lazaric
Abstract

A. Examples of TS distributions. Example 1 (uniform distribution): $\eta \sim \mathcal{U}(\mathcal{B}_d(0,\sqrt{d}))$. By definition, the uniform distribution satisfies the concentration property with constants $c = 1$ and $c' = e/d$. For any direction $u$ of $\mathbb{R}^d$, the set $\{\eta \mid u^\top \eta \ge 1\} \cap \mathcal{B}_d(0,\sqrt{d})$ is a hyper-spherical cap. The anti-concentration property therefore holds provided that the ratio between the volume of a hyper-spherical cap of height $\sqrt{d} - 1$ and the volume of the ball of radius $\sqrt{d}$ is bounded below by a positive constant independent of $d$. Using standard geometric results (see Prop. 9), one has that, for any vector $u$ with $\|u\| = 1$, $\mathbb{P}(u^\top \eta \ge 1) = 1$[…]
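The anti-concentration argument can be checked numerically. The sketch below is a Monte Carlo illustration, not part of the original appendix. It samples $\eta$ uniformly from $\mathcal{B}_d(0,\sqrt{d})$, taking the direction from a normalized Gaussian and the radius as $\sqrt{d}\,U^{1/d}$. It then estimates $\mathbb{P}(u^\top\eta \ge 1)$ for a fixed unit vector $u$. The function names `sample_uniform_ball` and `anti_concentration_estimate` and the sample sizes are illustrative assumptions.

```python
import numpy as np


def sample_uniform_ball(n: int, d: int, radius: float, rng: np.random.Generator) -> np.ndarray:
    """Draw n points uniformly from the d-dimensional ball B_d(0, radius)."""
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    g = rng.standard_normal((n, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Scaling the radius by U^(1/d) makes the points uniform in volume.
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return directions * radii


def anti_concentration_estimate(d: int, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(u^T eta >= 1) for eta ~ U(B_d(0, sqrt(d)))."""
    rng = np.random.default_rng(seed)
    eta = sample_uniform_ball(n, d, np.sqrt(d), rng)
    u = np.zeros(d)
    u[0] = 1.0  # by rotational symmetry, any unit vector gives the same probability
    return float(np.mean(eta @ u >= 1.0))


if __name__ == "__main__":
    for d in (2, 10, 50, 200):
        print(d, anti_concentration_estimate(d, n=20_000))
```

For $d = 2$ the exact value is $(\pi/2 - 1)/(2\pi) \approx 0.091$. As $d$ grows, a single coordinate of $\eta$ becomes approximately standard Gaussian, so the estimate should level off near $1 - \Phi(1) \approx 0.16$ rather than vanish. That is the $d$-independent lower bound the anti-concentration property requires. Every sample also satisfies $\|\eta\| \le \sqrt{d}$, which is why the concentration property holds with $c = 1$ without any probabilistic argument.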


Similar resources

Linear Thompson Sampling Revisited

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\tilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and h...
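To make the setting concrete, here is a minimal sketch of linear TS in a perturbation form consistent with the TS distributions of Example 1. The regularized least-squares estimate $\hat\theta_t$ is perturbed as $\tilde\theta_t = \hat\theta_t + \beta V_t^{-1/2}\eta_t$, and the arm maximizing $x^\top\tilde\theta_t$ is played. The abstract does not specify the following, so they are assumptions: a finite arm set, Gaussian reward noise, a constant scale `beta` in place of a time-dependent confidence radius, regularization `lam`, and $\eta_t$ drawn uniformly from $\mathcal{B}_d(0,\sqrt{d})$. All function and parameter names are hypothetical.

```python
import numpy as np


def sample_uniform_ball(d, radius, rng):
    """One point drawn uniformly from the d-dimensional ball B_d(0, radius)."""
    g = rng.standard_normal(d)
    return radius * rng.uniform() ** (1.0 / d) * g / np.linalg.norm(g)


def linear_ts(arms, theta_star, horizon, beta=1.0, lam=1.0, noise_sd=0.1, seed=0):
    """Linear Thompson sampling on a fixed finite arm set (illustrative sketch).

    arms: (K, d) array of arm features; theta_star: true (unknown) parameter.
    Returns the cumulative pseudo-regret after each round.
    """
    rng = np.random.default_rng(seed)
    _, d = arms.shape
    V = lam * np.eye(d)            # regularized design matrix V_t
    b = np.zeros(d)                # running sum of x_s * r_s
    best_value = np.max(arms @ theta_star)
    cum_regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        theta_hat = np.linalg.solve(V, b)          # regularized least-squares estimate
        w, Q = np.linalg.eigh(V)                   # V is symmetric positive definite
        V_inv_sqrt = (Q / np.sqrt(w)) @ Q.T        # V^{-1/2} = Q diag(w^{-1/2}) Q^T
        eta = sample_uniform_ball(d, np.sqrt(d), rng)  # assumed TS perturbation
        theta_tilde = theta_hat + beta * V_inv_sqrt @ eta
        x = arms[int(np.argmax(arms @ theta_tilde))]   # act greedily on the sample
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        total += best_value - x @ theta_star
        cum_regret[t] = total
    return cum_regret


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    arms = rng.standard_normal((20, 5))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)   # unit-norm arms
    theta_star = rng.standard_normal(5)
    theta_star /= np.linalg.norm(theta_star)
    print("cumulative regret after 2000 rounds:", linear_ts(arms, theta_star, 2000)[-1])
```

The only randomization is the perturbation $\eta_t$. Once it is drawn, the action is chosen greedily, so all exploration comes from how widely $V_t^{-1/2}\eta_t$ spreads the sampled parameter.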


Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors

In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and...



Stochastic Regret Minimization via Thompson Sampling

The Thompson Sampling (TS) policy is a widely implemented algorithm for the stochastic multi-armed bandit (MAB) problem. Given a prior distribution over possible parameter settings of the underlying reward distributions of the arms, at each time instant, the policy plays an arm with probability equal to the probability that this arm has the largest mean reward conditioned on the current posterior di...
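The probability-matching rule described in this abstract can be sketched for Bernoulli rewards with conjugate Beta priors. The abstract does not fix a reward model, so this choice is an assumption. The sketch draws one sample per arm from its posterior and plays the argmax, which selects each arm with exactly its posterior probability of having the largest mean. `posterior_prob_best` estimates that probability by Monte Carlo. The function names and the uniform Beta(1, 1) prior are illustrative choices.

```python
import numpy as np


def thompson_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns pull counts and posterior parameters."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)   # 1 + observed successes (uniform Beta(1, 1) prior)
    beta = np.ones(k)    # 1 + observed failures
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        draws = rng.beta(alpha, beta)            # one posterior sample per arm
        arm = int(np.argmax(draws))              # play the arm whose sample is largest
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls, alpha, beta


def posterior_prob_best(alpha, beta, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(arm i has the largest mean | data) for each arm."""
    rng = np.random.default_rng(seed)
    draws = rng.beta(alpha, beta, size=(n_draws, len(alpha)))
    winners = np.argmax(draws, axis=1)
    return np.bincount(winners, minlength=len(alpha)) / n_draws


if __name__ == "__main__":
    pulls, a, b = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("P(best | data):", posterior_prob_best(a, b))
```

Sampling once and taking the argmax avoids computing the posterior probability of being best explicitly. The Monte Carlo helper is only there to make that probability visible.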


Thompson Sampling for Linear-Quadratic Control Problems

We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the...




Publication date: 2017